Small-world properties of the Indian Railway network 
Structural properties of the Indian Railway network are studied in the light of recent investigations 
of the scaling properties of different complex networks. Stations are considered as 'nodes' and an 
arbitrary pair of stations is said to be connected by a 'link' when at least one train stops at both 
stations. Rigorous analysis of the existing data shows that the Indian Railway network displays 
small-world properties. We define and estimate several other quantities associated with this network. 
Given a chance, how would we have organized our train travel? People
dislike changing trains to reach their destinations. Therefore an extreme
possibility would be to run a single train passing through all stations in
the country so that no change of train is needed at all! An obvious
disadvantage of this strategy is that the average distance between stations
becomes very large, and so does the time needed for travel. The other
limiting situation would be to run a train between every pair of
neighbouring stations and to travel along the minimal paths. This requires a
change of train at every station, which is clearly not economically viable
either. Railway networks in no country in the world follow either of these
two extremes; in practice they take a middle course. Like any other
transport system, the main motivation of railways is to be fast and
economical. To achieve this, railways simultaneously run many trains
covering short as well as long routes, so that a traveller needs to change
trains only a few times to reach any arbitrary destination in the country.

In this paper we analyse the structure of the Indian Railway network (IRN).
This is done in the context of recent investigations of the scaling
properties of several complex networks, e.g., social, biological and
computational networks. Identifying the stations as nodes of the network and
a train which stops at any two stations as the link between the nodes, we
measure the average distance between an arbitrary pair of stations and find
that it depends only logarithmically on the total number of stations in the
country. While from the network point of view this implies the small-world
nature of the railway network, in practice it means that a traveller has to
change trains only a few times to reach an arbitrary destination. This
implies that over the years the railway network has evolved with the sole
aim of being fast and economical, and that its structure has eventually
become a small-world network.

FIG. 1: The variation of the mean distance $\mathcal{D}(N)$ of 25 different
subsets of IRN having different numbers of nodes ($N$). The whole range is
fitted with a function of the form $\mathcal{D}(N) = A + B\log(N)$ where
$A \approx 1.33$ and $B \approx 0.13$. The inset shows the distribution
Prob($\ell$) of the shortest path lengths $\ell$ on IRN. The lengths vary up
to a maximum of only five link lengths and the network has a mean distance
$\mathcal{D}(N) \approx 2.16$.

The structure and properties of several social, biological and computational
networks, like the World-wide web (WWW), the network of the Internet
structure, neural networks, collaboration networks etc., are being studied
recently with much interest. In general a network has a number of 'nodes'
and some 'links' connecting different pairs of nodes. Typically the
following quantities are defined to characterize a network of $N$ nodes:
(i) the diameter, which is the maximum distance between an arbitrary pair of
nodes; (ii) the clustering coefficient $C(N)$, which is the average fraction
of connected triplets; (iii) the probability distribution $P(k)$ that an
arbitrarily selected node has the degree $k$, i.e. that this node is linked
to $k$ other nodes.

FIG. 2: Variation of the clustering coefficient $C(N)$ of 25 different
subsets of IRN having different numbers of nodes $N$. Starting from a
somewhat higher value at small numbers of nodes, the clustering coefficient
decreases slowly on increasing $N$ and finally saturates at 0.69.

Watts and Strogatz proposed a model of small-world network (SWN) in the
context of various social and biological networks. They argued that SWNs
must have small diameters which grow as $\ln N$ like random networks, but
should have large values of the clustering coefficient $C(N) \sim 1$ like
regular networks. On the other hand the scale-free networks (SFN) are
characterized by the power law decay of the degree distribution function:
$P(k) \sim k^{-\gamma}$. It was observed later that the degree distributions
of nodes for two very important networks, namely the World-wide web, which
is a network of web-pages and the hyperlinks among various pages, and the
Internet network of routers or autonomous systems, have the scale-free
property. Barabasi and Albert (BA) proposed a model for SFN which grows from
an initial set of nodes: at every time step some additional nodes are
introduced which are randomly connected to the previous nodes with linear
attachment probabilities. All scale-free networks are believed to display
small-world properties, while a small-world network is not necessarily
scale-free.

Networks defined on the Euclidean space have also generated much interest in
recent times. The Internet, transport systems, postal networks etc. are
naturally defined on two-dimensional space. In these generalised networks
the attachment probabilities depend jointly on the nodal degrees as well as
the lengths of the links.

A railway network is one of the most important examples of transport
systems. The very complex topological structures of railway networks have
attracted the attention of researchers in many different contexts. For
example the fractal nature of the structure of railway networks was studied
by Benguigui. Very recently the efficiency of the Boston subway network has
been studied, where a new measure for such networks has been proposed.

FIG. 3: The variation of the clustering coefficient $C(k)$ against the
degree $k$ for the IRN indicates a logarithmic decay for large $k$.

Our scheme is to first associate a representative graph $G_N$ with the IRN
of $N$ stations in the following way. The stations represent the 'nodes' of
the graph, whereas two arbitrary stations are considered to be connected by
a 'link' when there is at least one train which stops at both the stations.
These two stations are considered to be at unit distance of separation
irrespective of the geographical distance between them. Therefore the
shortest distance $\ell_{ij}$ between an arbitrary pair of stations $s_i$
and $s_j$ is the minimum number of different trains one needs to board to
travel from $s_i$ to $s_j$. Thus $\ell_{ij} = 1$ implies that there is at
least one train which stops at both $s_i$ and $s_j$. Similarly,
$\ell_{ij} = 2$ implies that there is no train which stops at both $s_i$ and
$s_j$ and one has to change trains at least once at some intermediate
station to board the second train to reach $s_j$. With this definition, if
the trains $t_1, t_2, \ldots, t_n$ pass through a station $s_i$, then all
the stations through which these $n$ trains pass are at unit distance from
$s_i$ and are considered as first neighbours of $s_i$. Consequently, the
number $k_i$ of such stations is the degree of the node $s_i$.

The Indian Railway network is a densely populated network of more than 8000
stations where the number of trains plying in this network is of the order
of 10000. However, we collected the data of IRN on a coarse-grained level
following the recent Indian Railways timetable 'Trains at a Glance',
containing the important trains and stations in India. This timetable
contains a total of $L = 579$ trains covering $N = 587$ stations in a total
of 86 tables. A grand rectangular matrix $Q(N, L)$ is then constructed such
that the $ij$-th element of this matrix is 1 if the train $j$ stops at the
station $i$; otherwise this element is zero. A second matrix $T(0{:}N, N)$
is also constructed where the degree $k_i$ of the station $i$ is stored at
the element $T(0, i)$ and the serial numbers of the $k_i$ neighbours of $i$
are stored at the locations $T(j, i)$, $j = 1, \ldots, k_i$, the rest of the
elements being zero. We define and estimate the following quantities for the
IRN.
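The construction of $Q$ and of the neighbour lists can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the toy timetable, the function names `build_incidence` and `build_neighbours`, and the use of a dictionary of sets in place of the matrix $T$ are all hypothetical and are not taken from the original data or code.

```python
from itertools import combinations

# Hypothetical toy timetable: train name -> stations where it stops.
# The names are illustrative only, not from 'Trains at a Glance'.
TIMETABLE = {
    "T1": ["A", "B", "C"],
    "T2": ["C", "D"],
    "T3": ["D", "E", "F"],
}

def build_incidence(timetable):
    """Return (stations, trains, Q) with Q[i][j] = 1 if train j stops at station i."""
    trains = sorted(timetable)
    stations = sorted({s for stops in timetable.values() for s in stops})
    index = {s: i for i, s in enumerate(stations)}
    Q = [[0] * len(trains) for _ in stations]
    for j, t in enumerate(trains):
        for s in timetable[t]:
            Q[index[s]][j] = 1
    return stations, trains, Q

def build_neighbours(stations, Q):
    """Link two stations whenever at least one train stops at both."""
    neighbours = {s: set() for s in stations}
    n_trains = len(Q[0]) if Q else 0
    for j in range(n_trains):
        served = [stations[i] for i in range(len(stations)) if Q[i][j]]
        for u, v in combinations(served, 2):
            neighbours[u].add(v)
            neighbours[v].add(u)
    return neighbours

stations, trains, Q = build_incidence(TIMETABLE)
neighbours = build_neighbours(stations, Q)
# The degree k_i is the number of first neighbours of station i.
degree = {s: len(nb) for s, nb in neighbours.items()}
```

In this toy case station "C" is served by two trains and therefore has the three first neighbours "A", "B" and "D".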

FIG. 4: The cumulative degree distribution $F(k)$ of the IRN plotted against
the degree $k$ on a semi-logarithmic scale.

Since $G_N$ is a connected graph, there are $N(N-1)/2$ distinct shortest
paths among the $N$ stations. We calculate the probability distribution
Prob($\ell$) of the shortest path lengths. The shortest path lengths are
calculated using a burning algorithm and the matrix $T$. In this algorithm
the fire starts from an arbitrary node $i$, and burns this node at time
$t = 0$. At time $t = 1$ the fire burns all $k_i$ neighbours of $i$. At time
$t = 2$ all unburnt neighbours of these $k_i$ nodes are burnt, and so on.
The burning time of a node is the length of the shortest path of that node
from the node $i$. This calculation has been repeated for all $N$ nodes to
get $N(N-1)/2$ shortest distances. In the inset of Fig. 1 we plot this
distribution, which goes to a maximum of $\ell = 5$, implying that one needs
to change trains at most four times to reach any station from any other
station in India on the coarse-grained level. Similarly the distribution has
a peak at $\ell = 2$, implying that one can go to the majority of stations
in India by changing trains only once. In graph theory the diameter of a
graph is measured by the maximum distance between the pairs of nodes.
Therefore according to this definition the diameter of our network is
exactly equal to 5. However the average shortest path between an arbitrarily
selected pair of nodes, which we call the mean distance $\mathcal{D}(N)$, is
also a measure of the topological size of the graph and has been used by
many authors to measure the size of networks. We therefore measure the mean
distance $\mathcal{D}(N)$ of the railway network of $N$ stations as the
average shortest distance $\langle \ell_{ij} \rangle$ between an arbitrary
pair of stations $s_i$ and $s_j$. We obtain $\mathcal{D}(N) \approx 2.16$
for this network.
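The burning procedure described above is a breadth-first search. A minimal Python sketch follows, assuming a connected graph stored as a dictionary of neighbour sets; the toy graph and the function names are hypothetical and serve only to illustrate how Prob($\ell$), the mean distance and the diameter follow from the burning times.

```python
from collections import Counter, deque

# Hypothetical toy graph (adjacency sets); not IRN data.
GRAPH = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E", "F"},
    "E": {"D", "F"},
    "F": {"D", "E"},
}

def burning_times(graph, source):
    """Breadth-first 'burning': a node's burning time is its shortest distance from source."""
    time = {source: 0}
    front = deque([source])
    while front:
        node = front.popleft()
        for nb in graph[node]:
            if nb not in time:          # still unburnt
                time[nb] = time[node] + 1
                front.append(nb)
    return time

def path_length_statistics(graph):
    """Return (Prob(l), mean distance, diameter) over all unordered pairs.

    Assumes the graph is connected, as stated for G_N in the text.
    """
    counts = Counter()
    nodes = sorted(graph)
    for i, s in enumerate(nodes):
        time = burning_times(graph, s)
        for t in nodes[i + 1:]:         # count each pair only once
            counts[time[t]] += 1
    pairs = sum(counts.values())        # equals N(N-1)/2
    prob = {l: c / pairs for l, c in sorted(counts.items())}
    mean = sum(l * c for l, c in counts.items()) / pairs
    return prob, mean, max(counts)

prob, mean_distance, diameter = path_length_statistics(GRAPH)
```

For this six-node toy graph the 15 pairs split into 7, 4 and 4 pairs at distances 1, 2 and 3, so the diameter is 3 and the mean distance is 27/15 = 1.8.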

FIG. 5: Scaled probability distribution $D_t(n_t)$ for an arbitrary station
through which $n_t$ trains pass, where $\langle n_t \rangle \approx 12.06$.
Binned data are presented through the circles connected by lines, which fit
best to an exponential form: $D_t(n_t)\langle n_t \rangle = a\exp(-bx)$ with
$x = n_t/\langle n_t \rangle$, $a \approx 0.47$ and $b \approx 0.75$.

It is desirable to see how $\mathcal{D}(N)$ varies with $N$. Since we have
the data of a single railway network, we divide the whole IRN into 25
different subsets consisting of trains and stations of 10 different states,
7 different combinations of states, 7 different railway zones and the whole
IRN. As a result we obtain 25 data points (though they are not necessarily
non-overlapping samples), reflecting the nature of the variation of
$\mathcal{D}(N)$ with $N$. In Fig. 1 we plot these data on a semi-log scale.
Though there are some wild fluctuations for small values of $N$, for large
values of $N$ the linear behaviour is quite apparent. The whole range is
fitted with $\mathcal{D}(N) = A + B\log(N)$ where $A \approx 1.33$ and
$B \approx 0.13$.
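A fit of this form is an ordinary least-squares line in the variable $\log N$. The sketch below is only an illustration: the subset sizes are hypothetical, the points are generated from the fitted form itself rather than from the 25 measured values, and the use of the natural logarithm is an assumption. That assumption is at least consistent with the quoted numbers, since $1.33 + 0.13\ln 587 \approx 2.16$ reproduces the measured mean distance, whereas a base-10 logarithm would give about 1.69.

```python
import math

def fit_log_law(sizes, distances):
    """Ordinary least squares for D = A + B * ln(N); returns (A, B)."""
    xs = [math.log(n) for n in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, distances))
    B = sxy / sxx
    A = mean_y - B * mean_x
    return A, B

# Synthetic, noise-free points generated from the quoted fit
# (hypothetical subset sizes; these are NOT the measured data points).
sizes = [20, 50, 100, 200, 400, 587]
distances = [1.33 + 0.13 * math.log(n) for n in sizes]
A, B = fit_log_law(sizes, distances)

# Consistency check of the quoted A, B against the quoted mean distance,
# assuming a natural logarithm.
predicted_full = 1.33 + 0.13 * math.log(587)
```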

The clustering coefficient $C(N)$ is defined in the following way. Let the
subgraph $G_i$ consisting of the neighbours of $s_i$, i.e.
$(s_1, s_2, s_3, \ldots, s_{k_i})$, have $E_i$ links among them. Then the
clustering coefficient $C_i$ of the node $i$ is $2E_i/[k_i(k_i - 1)]$ and
that of the whole network is $C = \langle C_i \rangle$. A direct measure of
the clustering coefficient of the whole IRN gives $C \approx 0.69$ (Fig. 2).
The high value of the clustering coefficient is explained in the following
way. The $n_s$ stations at which a particular train stops are all at unit
distance from one another on the network and therefore form an
$n_s$-clique. Therefore if only one train stops at some station $i$ then
$C_i = 1$. When two trains stop at the station $i$ and the sets $n_s(1)$ and
$n_s(2)$ of stations covered by these two trains are different, $C_i$ is in
general smaller than 1. However there may be other trains which do not stop
at $i$ but stop at stations of $n_s(1)$ and of $n_s(2)$ that the two sets do
not share, thereby linking neighbours of $i$ that would otherwise be
unconnected. These trains enhance the value of $C_i$. The value
$C \approx 0.69$ is compared with that of a corresponding random graph
having the same number of vertices and edges as the IRN, with the edges
distributed randomly. It is found that the number of edges in IRN is 19603.
If these edges are distributed randomly within the maximum possible edges on
a graph of $N = 587$ nodes then the clustering coefficient should be
$19603/[N(N-1)/2] \approx 0.113$, which is the same as Prob(1). We also
compute a modified clustering coefficient $C_0$ by counting in $E_i$ only
those links in the subgraph $G_i$ that arise from trains passing through the
node $i$. We obtained a value $C_0 \approx 0.55$ for the IRN.
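The definition of $C_i$ and the clique argument can be checked on small examples. The following Python sketch is illustrative only: the toy trains and the helper names are hypothetical, and assigning $C_i = 0$ to nodes with fewer than two neighbours is a convention assumed here, not one stated in the text.

```python
from itertools import combinations

def local_clustering(graph, node):
    """C_i = 2 E_i / (k_i (k_i - 1)), E_i = number of links among the neighbours."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0                      # convention assumed here for k_i < 2
    links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))

def mean_clustering(graph):
    """C = <C_i>, the average of C_i over all nodes."""
    return sum(local_clustering(graph, s) for s in graph) / len(graph)

def clique_graph(trains):
    """Every train turns the stations at which it stops into a clique."""
    graph = {}
    for stops in trains:
        for s in stops:
            graph.setdefault(s, set()).update(x for x in stops if x != s)
    return graph

# Hypothetical toy examples, not IRN data.
one_train = clique_graph([["A", "B", "C", "D"]])
two_trains = clique_graph([["A", "B", "C"], ["C", "D", "E"]])
```

With a single train every station has $C_i = 1$. With two trains sharing only station "C", that station has four neighbours but only two links among them, so $C_i = 1/3$; a third train linking, say, "A" and "D" without stopping at "C" would raise this value.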

Recently, the study of the clustering coefficient as a function of the
degree of the node in some real-world networks has shown an interesting
feature. $C(k)$, defined as the clustering coefficient of the nodes with
degree $k$, shows a decrease (apparently a power law decay) with $k$ in
several networks like the actor, language or world-wide-web networks.
However in the network of the Internet at the router level or the power grid
network of the Western US, $C(k)$ was found to be more or less a constant.
In the IRN also, we find that $C(k)$ (Fig. 3) remains at a constant value
close to unity for small $k$ and shows a logarithmic decay at larger values
of $k$. In all these real-world networks where $C(k)$ remains more or less a
constant, the nodes are linked by physical connections, which may be
responsible for this common feature. However, in this context it should also
be mentioned that the scale-free Barabasi-Albert network also predicts a
$C(k)$ that is independent of $k$ and a $C(N)$ that decays as a power law in
$N$. In the IRN, although $C(N)$ shows a decrease with $N$, it is apparently
much slower than a power law.

The degree distribution of the network, that is, the distribution of the
number of stations $k$ which are connected by direct trains to an arbitrary
station, is denoted by $P(k)$. We plot the cumulative degree distribution
$F(k) = \int_k^{\infty} P(k')\,dk'$ using a semi-log scale in Fig. 4 for the
whole IRN. We see that $F(k)$ approximately fits an exponentially decaying
distribution $F(k) \sim \exp(-\alpha k)$ with $\alpha = 0.0085$.
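For a finite list of degrees the integral becomes a sum, and the decay constant is minus the slope of $\ln F(k)$ against $k$. The sketch below illustrates this under stated assumptions: the degree sequence is hypothetical, $F(k)$ is taken as the fraction of nodes with degree at least $k$, and an unweighted least-squares line stands in for whatever fitting procedure was actually used.

```python
import math
from collections import Counter

def cumulative_degree_distribution(degrees):
    """Return sorted (k, F(k)), F(k) = fraction of nodes with degree >= k."""
    n = len(degrees)
    counts = Counter(degrees)
    result = []
    remaining = n
    for k in sorted(counts):
        result.append((k, remaining / n))
        remaining -= counts[k]
    return result

def semilog_slope(points):
    """Least-squares slope of ln F(k) versus k; equals -alpha if F ~ exp(-alpha k)."""
    xs = [k for k, _ in points]
    ys = [math.log(f) for _, f in points]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical degree sequence for illustration only (not IRN data).
degrees = [1, 1, 1, 1, 2, 2, 3, 4]
F = cumulative_degree_distribution(degrees)
alpha = -semilog_slope(F)
```

Here $F(k)$ halves at each step, so the sketch recovers a decay constant of $\ln 2$.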

We also calculated the distribution $D_t(n_t)$ of the number of trains $n_t$
which stop at an arbitrary station. This is plotted in Fig. 5 on a semi-log
scale after scaling by the average number of trains
$\langle n_t \rangle \approx 12.06$ along both the abscissa and the
ordinate. The data are binned as before and fitted to an exponential form:
$D_t(n_t)\langle n_t \rangle = a\exp(-bx)$ with
$x = n_t/\langle n_t \rangle$, $a \approx 0.47$ and $b \approx 0.75$.

The distribution $D_s(n_s)$ of the number of stations $n_s$ through which an
arbitrary train passes is plotted in Fig. 6. The data are scaled by the
average number of stations $\langle n_s \rangle \approx 12.37$ along both
the abscissa and the ordinate. $D_s(n_s)$ grows very fast at the beginning,
reaches a maximum and then decays to zero. A numerical fit to a functional
form like $D_s(n_s)\langle n_s \rangle = a x^{\ldots}/(x^{\ldots} + b)^{\ldots}$
with $x = n_s/\langle n_s \rangle$, $a \approx 0.6$ and $b \approx 0.096$
turns out to be reasonably good.

FIG. 6: Scaled probability distribution $D_s(n_s)$ for an arbitrary train
passing through $n_s$ stations, where $\langle n_s \rangle \approx 12.37$.
Binned data are presented through the black dots connected by lines, which
fit best to the form
$D_s(n_s)\langle n_s \rangle = a x^{\ldots}/(x^{\ldots} + b)^{\ldots}$ with
$x = n_s/\langle n_s \rangle$, $a \approx 0.6$ and $b \approx 0.096$.

We also measure the connectivity correlation of IRN following earlier works.
Let $F(k'|k)$ denote the conditional probability that a node of degree $k$
has a neighbour of degree $k'$. Then to see how the nodes of different
degrees are correlated we measure the average degree
$\langle k_{nn}(k) \rangle = \sum_{k'} k' F(k'|k)$ of the subset of nodes
which are all neighbours of a particular node of degree $k$. In general this
average has a variation like $\langle k_{nn}(k) \rangle \sim k^{-\nu}$,
where a non-zero $\nu$ reflects a non-trivial correlation among the nodes of
the network. We calculated $\langle k_{nn}(k) \rangle$ for IRN and plotted
it in Fig. 7 on a double logarithmic scale. Almost over a decade
$\langle k_{nn}(k) \rangle$ remains the same on the average and is
independent of $k$, indicating the absence of correlations among the nodes
of different degrees.
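The quantity $\langle k_{nn}(k) \rangle$ can be estimated directly from the neighbour lists: for each node one averages the degrees of its neighbours, and these per-node values are then averaged over all nodes of the same degree $k$. The Python sketch below is illustrative; the star graph and the function name are hypothetical.

```python
from collections import defaultdict

def average_neighbour_degree(graph):
    """Return {k: <k_nn(k)>}, the mean degree of the neighbours of nodes of degree k."""
    degree = {s: len(nb) for s, nb in graph.items()}
    total = defaultdict(float)
    count = defaultdict(int)
    for s, nbrs in graph.items():
        if not nbrs:
            continue                    # isolated nodes have no neighbours
        knn_s = sum(degree[v] for v in nbrs) / len(nbrs)
        total[degree[s]] += knn_s
        count[degree[s]] += 1
    return {k: total[k] / count[k] for k in sorted(total)}

# Hypothetical toy graph: a star with centre "H" and three leaves.
STAR = {"H": {"a", "b", "c"}, "a": {"H"}, "b": {"H"}, "c": {"H"}}
knn = average_neighbour_degree(STAR)
```

In the star the leaves (degree 1) see a neighbour of degree 3 and the centre (degree 3) sees neighbours of degree 1, a strongly $k$-dependent $\langle k_{nn}(k) \rangle$, in contrast to the flat behaviour reported for the IRN.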

A more sensitive measure for the degree correlations was proposed by Newman
[18], who defined a degree-degree correlation function $r$ which measures
whether a vertex of high degree at one end of a link prefers a vertex of
high degree ("assortative mixing", $r > 0$) or low degree ("disassortative
mixing", $r < 0$) at the other end. It has been observed that social
networks are assortative and technological and biological networks are
disassortative. We have measured for IRN the normalized correlation function
following [18] and found its value to be $r = -0.033$. This indicates that
the IRN is of disassortative nature, i.e. rich vertices at one end of a link
show some preference towards poor vertices at the other end, and vice versa.
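A common way to compute such a coefficient is as the Pearson correlation between the degrees found at the two ends of every link. The sketch below uses that form on hypothetical toy graphs; it is assumed, not confirmed by the text, that this matches the normalization the authors used.

```python
def degree_assortativity(graph):
    """Pearson correlation of the degrees at the two ends of a link.

    Undefined (division by zero) when all nodes have the same degree.
    """
    degree = {s: len(nb) for s, nb in graph.items()}
    # Visit every link in both directions so that the measure is symmetric.
    ends = [(degree[u], degree[v]) for u in graph for v in graph[u]]
    m = len(ends)
    mean = sum(j for j, _ in ends) / m
    cov = sum(j * k for j, k in ends) / m - mean ** 2
    var = sum(j * j for j, _ in ends) / m - mean ** 2
    return cov / var

# Hypothetical toy graphs, not IRN data.
STAR = {"H": {"a", "b", "c"}, "a": {"H"}, "b": {"H"}, "c": {"H"}}
PATH = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
r_star = degree_assortativity(STAR)
r_path = degree_assortativity(PATH)
```

The star, where every link joins the hub to a leaf, gives $r = -1$, and the four-node path gives $r = -0.5$; the value $r = -0.033$ reported for the IRN is much closer to zero.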

FIG. 7: The variation of the average degree $\langle k_{nn}(k) \rangle$ of
the neighbours of a node of degree $k$ with $k$. After some initial
fluctuations, $\langle k_{nn}(k) \rangle$ remains almost the same over a
decade around $k = 30$ to 300, indicating the absence of correlations among
the nodes of different degrees.

To summarize, we investigated the structural properties of the Indian
Railway network to see if some of the general scaling behaviour obtained for
many complex networks in recent times may also be present in IRN. While the
nodes of the network are evidently the stations, the links are defined as
the pairs of stations served by a single train. With such a definition of
link, the mean distance of the network is a measure of how good the
connectivity of the network is. Indeed, we observed that the mean distance
of IRN varies logarithmically with the number of nodes, together with a high
value of the clustering coefficient. This implies that IRN behaves like a
small-world network, which we believe should be typical of the railway
network of any other country; we are unable to study other networks at
present for want of data.